The discriminant power of RNA features for pre-miRNA recognition
Computational discovery of microRNAs (miRNA) is based on pre-determined sets
of features from miRNA precursors (pre-miRNA). These feature sets used by
current tools for pre-miRNA recognition differ in construction and dimension.
Some feature sets are composed of sequence-structure patterns commonly found in
pre-miRNAs, while others are a combination of more sophisticated RNA features.
Current tools achieve similar predictive performance even though the feature
sets used - and their computational cost - differ widely. In this work, we
analyze the discriminant power of seven feature sets, which are used in six
pre-miRNA prediction tools. The analysis is based on the classification
performance achieved with these feature sets for the training algorithms used
in these tools. We also evaluate feature discrimination through the F-score and
feature importance in the induction of random forests. More diverse feature
sets produce classifiers with significantly higher classification performance
compared to feature sets composed only of sequence-structure patterns. However,
small or non-significant differences were found among the estimated
classification performances of classifiers induced using sets with
diversification of features, despite the wide differences in their dimension.
Based on these results, we applied a feature selection method to reduce the
computational cost of computing the feature set, while maintaining discriminant
power. We obtained a lower-dimensional feature set, which achieved a
sensitivity of 90% and a specificity of 95%. Our feature set achieves a
sensitivity and specificity within 0.1% of the maximal values obtained with any
feature set while it is 34x faster to compute. Even compared to another feature
set, which is the computationally least expensive feature set of those from the
literature which perform within 0.1% of the maximal values, it is 34x faster to
compute.
Comment: Submitted to BMC Bioinformatics on October 25, 2013. The material to
reproduce the main results from this paper can be downloaded from
http://bioinformatics.rutgers.edu/Static/Software/discriminant.tar.g
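
The abstract above ranks features by F-score and reports sensitivity and specificity but defines neither. The following is a minimal sketch, not the authors' code. It assumes the F-score formulation commonly used for feature ranking with SVMs: between-class separation of the feature means divided by the sum of the within-class sample variances. The function names and the sample values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of a single feature (assumed formulation, see lead-in).

    pos, neg: values of the feature in the positive (pre-miRNA) and
    negative classes; each needs at least two values.
    """
    overall = mean(list(pos) + list(neg))
    # Squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sum of per-class sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    # A feature that is constant within both classes has zero spread.
    return between / within if within > 0 else float("inf")


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    print(f_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(sensitivity_specificity(tp=90, fn=10, tn=95, fp=5))
```

A larger F-score suggests that the feature separates the two classes better on its own. It says nothing about how features interact, which may be why the abstract also relies on random forest feature importance.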
Evaluation of noise reduction techniques in the splice junction recognition problem
The Human Genome Project has generated a large amount of sequence data, and a number of works are currently concerned with analyzing these data. One such analysis is the identification of gene structures in the sequences obtained, which involves searching for particular signals associated with gene expression. Splice junctions are one type of signal present in eukaryotic genes. Many studies have applied Machine Learning techniques to the recognition of such regions. However, most genetic databases contain noisy data, which can degrade the performance of the learning techniques. This paper evaluates the effectiveness of five data pre-processing algorithms in eliminating noisy instances from two splice junction recognition datasets. After the pre-processing phase, two learning techniques, Decision Trees and Support Vector Machines, are employed in the recognition process.
An Introduction to Support Vector Machines (Uma Introdução às Support Vector Machines)
This paper presents an introduction to Support Vector Machines (SVMs), a Machine Learning technique that has received increasing attention in recent years. SVMs have been applied to several pattern recognition tasks, obtaining results superior to those of other learning techniques in various applications.
A Method for Refining Knowledge Rules Using Exceptions
The search for patterns in data sets is a fundamental task in Data Mining, where Machine Learning algorithms are generally used. However, Machine Learning algorithms have biases that strengthen the classification task without taking exceptions into consideration. Exceptions contradict common sense rules. They are generally unknown, unexpected and contradictory to the user's beliefs. For this reason, exceptions may be interesting. In this work we propose a method to find exceptions to common sense rules. We also apply the proposed method to a real-world data set to discover rules and exceptions in the HIV protein cleavage process.
Sociedad Argentina de Informática e Investigación Operativ